Visualizing Networks in Python

Python offers many libraries for creating different types of visualizations; among the most commonly used are Matplotlib, Seaborn, and Plotly. The Python Graph Gallery is a collection of hundreds of plots made with Python.

Today, I would like to give you a simple guide to making network diagrams in Python.

In [16]:
# import library
import pandas as pd
import numpy as np

import plotly
import plotly.graph_objs as go
import plotly.express as px
import networkx as nx

import math
import matplotlib.pyplot as plt
import json

Install the libraries in the terminal using:

pip install package_name

or

conda install package_name

Displaying Plotly Charts on Jupyter Notebook

To use Plotly in JupyterLab, we need to install an extension. Run the following command:

jupyter labextension install jupyterlab-plotly@4.13.0

If it's still not showing anything, please refer to this page and the getting started doc.

Displaying Plotly Charts in HTML

See this page.

If you want to be able to show Plotly charts in HTML (offline), you need to add plotly.js to the output HTML with: pio.renderers.default='notebook'

In [160]:
import plotly.io as pio
pio.renderers.default='notebook'
In [159]:
%%HTML
<script src="require.js"></script>
In [ ]:
plotly.offline.init_notebook_mode()

Data

The data used here comes from one of my small projects and can be downloaded here. The topic of the project is human trafficking crime in China. Click here to see more if you are interested.

In [2]:
clean_rescured_table = pd.read_csv('clean_rescured_table.csv')
In [3]:
clean_rescured_table.head()
Out[3]:
   case_id               reason  age          from       to  lost_time  found_date
0     4492  put up for adoption    5         Henan    Henan         27        2022
1     4491           trafficked    3  Heilongjiang  Jiangsu         28        2022
2     4490  put up for adoption    0       Jiangsu  Jiangsu         46        2022
3     4489           trafficked    5       Guizhou   Fujian         18        2022
4     4488           trafficked    3       Sichuan  Jiangsu         34        2022

We can create an edge list from the data by using 'groupby' and counting the occurrences of every unique edge.

In [4]:
from_to = clean_rescured_table.groupby(['from', 'to'])['from'].count()
from_to
Out[4]:
from      to       
Anhui     Anhui        21
          Fujian        3
          Guangdong     7
          Guangxi       1
          Hainan        2
                       ..
Zhejiang  Shandong     26
          Shanghai      2
          Shanxi        1
          Sichuan       1
          Zhejiang     18
Name: from, Length: 439, dtype: int64
In [5]:
# The index contains the node pair of each unique edge
from_to.index
Out[5]:
MultiIndex([(   'Anhui',     'Anhui'),
            (   'Anhui',    'Fujian'),
            (   'Anhui', 'Guangdong'),
            (   'Anhui',   'Guangxi'),
            (   'Anhui',    'Hainan'),
            (   'Anhui',     'Hebei'),
            (   'Anhui',     'Henan'),
            (   'Anhui',     'Hubei'),
            (   'Anhui',     'Hunan'),
            (   'Anhui',   'Jiangsu'),
            ...
            ('Zhejiang',   'Guizhou'),
            ('Zhejiang',     'Hebei'),
            ('Zhejiang',     'Henan'),
            ('Zhejiang',   'Jiangsu'),
            ('Zhejiang',   'Jiangxi'),
            ('Zhejiang',  'Shandong'),
            ('Zhejiang',  'Shanghai'),
            ('Zhejiang',    'Shanxi'),
            ('Zhejiang',   'Sichuan'),
            ('Zhejiang',  'Zhejiang')],
           names=['from', 'to'], length=439)
In [6]:
# Create an empty dataframe with column names defined
from_to_df = pd.DataFrame(columns = ["from", "to", "count"])

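# Append data from the grouped series to the empty dataframe
# (.iloc makes the positional lookup explicit; plain from_to[i] relies on a deprecated fallback)
for i in range(len(from_to.index)):
    from_to_df.loc[i] = [from_to.index[i][0], from_to.index[i][1], from_to.iloc[i]]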
In [7]:
# The final edge list
from_to_df
Out[7]:
         from         to  count
0       Anhui      Anhui     21
1       Anhui     Fujian      3
2       Anhui  Guangdong      7
3       Anhui    Guangxi      1
4       Anhui     Hainan      2
...       ...        ...    ...
434  Zhejiang   Shandong     26
435  Zhejiang   Shanghai      2
436  Zhejiang     Shanxi      1
437  Zhejiang    Sichuan      1
438  Zhejiang   Zhejiang     18

439 rows × 3 columns
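
The row-by-row loop above works, but the same edge list can usually be built in one step. Here is a minimal sketch, assuming a toy table with the same 'from' and 'to' columns (the rows are made up for illustration and are not taken from the real data). groupby(...).size() counts each pair, and reset_index(name="count") turns the grouped result into ordinary columns.

```python
import pandas as pd

# Toy stand-in for clean_rescured_table (illustrative rows only)
toy = pd.DataFrame({
    "from": ["Anhui", "Anhui", "Anhui", "Henan"],
    "to":   ["Fujian", "Fujian", "Anhui", "Henan"],
})

# Count every unique (from, to) pair and flatten the result into columns
edge_list = toy.groupby(["from", "to"]).size().reset_index(name="count")
print(edge_list)
```

A side benefit of this form is that the count column keeps an integer dtype, which makes later numeric operations on it more predictable.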

Basic

The Python package NetworkX allows you to create, manipulate, and study complex networks.

The most basic network can be drawn using from_pandas_edgelist, which returns a graph from a pandas DataFrame containing an edge list. The DataFrame should have at least two columns of node names and zero or more columns of edge attributes. Each row represents one edge.

In [8]:
# Build the network
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to')

# Draw the network
nx.draw(G, with_labels=True)
plt.show()
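
To make the input format concrete, here is a minimal sketch on a three-row toy edge list (made-up counts). It also passes edge_attr, an optional argument of from_pandas_edgelist that the cell above does not use, so that the count column is kept on each edge.

```python
import pandas as pd
import networkx as nx

# Toy edge list: two node columns plus one attribute column
toy_edges = pd.DataFrame({
    "from":  ["Anhui", "Anhui", "Henan"],
    "to":    ["Fujian", "Henan", "Fujian"],
    "count": [3, 7, 2],
})

# edge_attr="count" copies the column onto every edge as an attribute
G_toy = nx.from_pandas_edgelist(toy_edges, "from", "to", edge_attr="count")

print(sorted(G_toy.nodes()))
for u, v, data in G_toy.edges(data=True):
    print(u, v, data["count"])
```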

As you can already see, the network we just built looks quite messy. Fortunately, we can use the arguments of the draw() function to customize the style and layout of our network diagram. We can change:

  1. nodes, parameters linked here
  2. labels, parameters linked here
  3. edges, parameters linked here
In [9]:
# Set the graph size
fig, ax = plt.subplots(figsize=(10, 10))

# Draw network with Custom settings
nx.draw(G, 
        with_labels=True,   
        node_size=1000,             # default = 300
        node_color="skyblue",    # Can be string or rgb(a) tuple 
        node_shape="8",            # One of the ‘so^>v<dph8’
        alpha=0.7,                 # Transparency
        font_size = 10,
        font_color = 'black',
        font_weight = 'bold',
        edge_color = 'orange',
        width = 3,
        
        # Try the layout settings below (keep exactly one pos= line active). Which one do you prefer?
        
        #pos=nx.fruchterman_reingold_layout(G)
        pos=nx.circular_layout(G)
        #pos=nx.random_layout(G)
        #pos=nx.spectral_layout(G)
        #pos=nx.spring_layout(G)
       )

plt.show()
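
Each layout function returns a dictionary that maps every node to an (x, y) position, and that dictionary is what the pos argument expects. A small sketch on a toy graph (the node names are placeholders):

```python
import networkx as nx

G_toy = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

# circular_layout spaces the nodes evenly on a circle
pos = nx.circular_layout(G_toy)
for node, (x, y) in pos.items():
    print(f"{node}: x={x:.2f}, y={y:.2f}")

# spring_layout starts from random positions; a seed makes it reproducible
pos_spring = nx.spring_layout(G_toy, seed=42)
```

Because pos is an ordinary dictionary, it can be computed once and reused across several nx.draw calls so that the nodes stay in the same place from one figure to the next.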

NetworkX offers G.degree(), which reports the number of edges connected to each node (a self-loop counts twice). This example shows two common ways to visualize the degree distribution: a degree-rank plot and a degree histogram.

In [10]:
# Set the graph size
fig, ax = plt.subplots(1, 2, figsize=(10,5))

G = nx.from_pandas_edgelist(from_to_df, 'from', 'to')

degree_sequence = sorted((d for n, d in G.degree()), reverse=True)

ax[0].plot(degree_sequence, "b-", marker="o")

ax[1].bar(*np.unique(degree_sequence, return_counts=True))

plt.show()
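
A tiny sketch of what G.degree() returns, on a made-up three-node graph. The self-loop is there on purpose: our data contains edges from a province to itself (such as Anhui to Anhui), and NetworkX counts each self-loop twice in the degree.

```python
import networkx as nx

G_toy = nx.Graph()
G_toy.add_edges_from([("A", "B"), ("A", "C"), ("A", "A")])  # ("A", "A") is a self-loop

# G.degree() yields (node, degree) pairs; dict() makes them easy to inspect
degrees = dict(G_toy.degree())
degree_sequence = sorted(degrees.values(), reverse=True)

print(degrees)
print(degree_sequence)
```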

Directed Network

It is very important to distinguish between directed and undirected networks. NetworkX offers the class DiGraph for defining directed networks, whereas the class Graph is used for defining undirected networks.

In [11]:
# Build the directed network
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to', create_using=nx.DiGraph())

# Set the graph size
fig, ax = plt.subplots(figsize=(10, 10))

# Draw network with Custom settings
nx.draw(G, 
        arrows=True,           # Add arrows to the network
        with_labels=True,   
        node_size=1000,            
        node_color="skyblue",     
        node_shape="8",           
        alpha=0.7,                 
        font_size = 10,
        font_color = 'black',
        font_weight = 'bold',
        edge_color = 'orange',
        width = 3,
        pos=nx.circular_layout(G)
   
       )

plt.show()
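
The difference is easy to see on a toy edge list (made-up rows): an undirected graph merges Anhui to Fujian and Fujian to Anhui into a single edge, while a directed graph keeps both and lets us separate incoming from outgoing connections.

```python
import pandas as pd
import networkx as nx

toy_edges = pd.DataFrame({
    "from": ["Anhui", "Fujian", "Anhui"],
    "to":   ["Fujian", "Anhui", "Henan"],
})

undirected = nx.from_pandas_edgelist(toy_edges, "from", "to")
directed = nx.from_pandas_edgelist(toy_edges, "from", "to", create_using=nx.DiGraph())

print(undirected.number_of_edges())   # opposite directions collapse into one edge
print(directed.number_of_edges())     # every row stays a separate edge
print(dict(directed.out_degree()))    # edges leaving each node
print(dict(directed.in_degree()))     # edges arriving at each node
```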

Map Variable to the Network

Is there a better way of showing the weight of an edge than a label? Of course! We can map the weight to the width or color of the edge, so that a deeper color or thicker line indicates a heavier connection between nodes.

edge_cmap lets us map a variable to a colormap. The available options can be found here: https://matplotlib.org/3.5.0/tutorials/colors/colormaps.html.

In [12]:
fig, ax = plt.subplots(figsize=(10, 10))

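# Build your graph, storing the count as an edge attribute
G=nx.from_pandas_edgelist(from_to_df, 'from', 'to', edge_attr='count', create_using=nx.DiGraph())

# Read the counts back in the order of G.edges(), which is the order nx.draw
# uses and which can differ from the row order of from_to_df
edge_counts = [d['count'] for _, _, d in G.edges(data=True)]

# Draw network with Custom settings
nx.draw(G, 
        arrows=True,          
        with_labels=True,   
        node_size=1000,            
        node_color="skyblue",     
        node_shape="8",           
        alpha=0.7,                 
        font_size = 10,
        font_color = 'black',
        font_weight = 'bold',
        edge_color = edge_counts,            # Map variable to edge color
        edge_cmap=plt.cm.Greys,              # Colormap for mapping intensities of edges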
        width = 3,
        pos=nx.circular_layout(G)
       )

plt.show()
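
One caveat when passing a list of per-edge values to edge_color or width: nx.draw pairs the values with the edges in the order of G.edges(), which need not be the row order of the DataFrame the graph was built from. A hedged sketch on a made-up three-edge graph shows the mismatch. Storing the value as an edge attribute (via edge_attr) and reading it back from G.edges(data=True) keeps values and edges aligned.

```python
import pandas as pd
import networkx as nx

toy_edges = pd.DataFrame({
    "from":  ["A", "B", "C"],
    "to":    ["C", "A", "A"],
    "count": [1, 2, 3],
})

G_toy = nx.from_pandas_edgelist(
    toy_edges, "from", "to", edge_attr="count", create_using=nx.DiGraph()
)

# Row order of the DataFrame versus iteration order of the graph
row_order = list(zip(toy_edges["from"], toy_edges["to"]))
edge_order = list(G_toy.edges())

# Values read from the graph itself always line up with edge_order
counts_in_edge_order = [d["count"] for _, _, d in G_toy.edges(data=True)]

print(row_order)
print(edge_order)
print(counts_in_edge_order)
```

Here node "C" enters the graph before node "B", so the edges starting at "C" are listed before those starting at "B", even though the DataFrame rows are sorted the other way.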

Note that when mapping the variable to edge width, the values might be too large: we could end up with a network hidden behind really thick edges. Therefore, we want to scale the values to a suitable range. There are many ways of doing so; the simplest is to take the logarithm. Be careful: we need the log function from the numpy library rather than the math library, because math.log expects a single value while numpy.log can compute the log of a whole sequence of numbers.
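
A short sketch of the scaling step on a few illustrative counts. It also shows one thing to keep in mind: log(1) is 0, so an edge with a count of 1 gets a width of 0 and disappears. The helper scale_to_range is hypothetical (not part of any library) and is just one of the "many ways" of mapping values into a chosen width range.

```python
import math
import numpy as np

counts = np.array([1, 3, 21, 617])   # illustrative counts

log_widths = np.log(counts)                   # element-wise, works on the whole array
one_by_one = [math.log(c) for c in counts]    # math.log needs an explicit loop

def scale_to_range(values, lo=0.5, hi=6.0):
    """Linearly rescale values to the interval [lo, hi] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values equal: use the middle of the range for every edge
        return np.full(values.shape, (lo + hi) / 2)
    return lo + (values - values.min()) / span * (hi - lo)

print(log_widths)
print(scale_to_range(log_widths))
```

Rescaling after the log keeps the ordering of the edges but guarantees that even the lightest edge is drawn with a visible width.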

In [13]:
fig, ax = plt.subplots(figsize=(10, 10))

# Build your graph, storing the count as an edge attribute
G=nx.from_pandas_edgelist(from_to_df, 'from', 'to', edge_attr='count', create_using=nx.DiGraph())

# Log-scaled widths, in the order of G.edges() (the order nx.draw uses)
edge_widths = np.log([d['count'] for _, _, d in G.edges(data=True)])

# Draw network with Custom settings
nx.draw(G, 
        arrows=True,          
        with_labels=True,   
        node_size=1000,            
        node_color="skyblue",     
        node_shape="8",           
        alpha=0.7,                 
        font_size = 10,
        font_color = 'black',
        font_weight = 'bold',
        edge_color = 'orange',                 
        width = edge_widths,                   # Map variable to edge width
        pos=nx.circular_layout(G)
       )

plt.show()

Much better!

It still looks like we have too many edges showing. We could always filter the data to show only some specific connections. But what if we want to keep all the edges and highlight the links with significant weight? Can you think of a way of doing so?
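
One possible answer, offered as a sketch rather than the only way: keep every edge, but draw edges at or above a cut-off in a strong color and the rest thin and pale. The graph below is a made-up toy, and THRESHOLD is a hypothetical value that you would tune to your own data.

```python
import networkx as nx

G_toy = nx.DiGraph()
G_toy.add_weighted_edges_from(
    [("Anhui", "Fujian", 3), ("Henan", "Fujian", 40), ("Anhui", "Henan", 1)],
    weight="count",   # store the third tuple element under the name "count"
)

THRESHOLD = 10  # hypothetical cut-off for a "significant" edge

# Build per-edge styles in the order of G_toy.edges(), the order nx.draw uses
edge_colors = []
edge_widths = []
for _, _, data in G_toy.edges(data=True):
    heavy = data["count"] >= THRESHOLD
    edge_colors.append("crimson" if heavy else "lightgrey")
    edge_widths.append(3.0 if heavy else 0.5)

print(list(G_toy.edges()))
print(edge_colors)
```

The two lists can then be passed to nx.draw as edge_color=edge_colors and width=edge_widths.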

Visualize Network on Basemap

Networks are often tied to geographic locations. In our case, the nodes are provinces of China and the links represent the number of human trafficking victims moved between them. Oftentimes, drawing such a network on a base map is a good way to convey the geography directly.

The very first step: download GeoJSON-formatted geometry for the regions/countries we would like to draw.

You can find GeoJSON files for most countries here.

In [17]:
with open('china.json') as file:
    china = json.load(file)
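
Before plotting, it helps to check how the regions are identified inside the file, because the names passed to locations later have to match them. The sketch below uses a tiny hand-written FeatureCollection instead of china.json. The property key "name" is an assumption, so inspect your own file to see which key it actually uses.

```python
import json

# A tiny hand-written FeatureCollection standing in for china.json
toy_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "id": "a", "properties": {"name": "Province A"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    {"type": "Feature", "id": "b", "properties": {"name": "Province B"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]]}}
  ]
}
""")

# Which identifiers are available for matching against `locations`?
feature_ids = [f.get("id") for f in toy_geojson["features"]]
feature_names = [f["properties"]["name"] for f in toy_geojson["features"]]

print(feature_ids)
print(feature_names)
```

By default Plotly matches locations against each feature's id. If the province names live under properties instead, px.choropleth accepts a featureidkey argument such as "properties.name".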

The most basic map can be created with plotly.express.choropleth(). The main parameters are:

  1. geojson: the GeoJSON data we just loaded
  2. locations: the names of the provinces
  3. color: we use range(33) because there are 33 provinces and we want to give each province a different color. We could also choose to map a variable to the colors.
  4. color_continuous_scale: a colormap
In [18]:
fig = px.choropleth(from_to_df, 
                    geojson=china, 
                    locations = from_to_df['from'].unique(),   
                    color = range(33),
                    color_continuous_scale="turbo",
                    basemap_visible = True,     
                   )

fig.update_geos(fitbounds="locations", visible=True)  # zoom the map to the locations we defined
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
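
The call above mixes a 439-row data frame with 33-element arrays and hard-codes the number of provinces. A possible tidier variant, sketched here on toy data with made-up counts, is to build one row per province first. The column names province and color_id are my own choice for illustration.

```python
import pandas as pd

toy_edges = pd.DataFrame({
    "from":  ["Anhui", "Anhui", "Henan", "Fujian"],
    "to":    ["Fujian", "Henan", "Fujian", "Anhui"],
    "count": [3, 7, 2, 5],
})

# One row per province, with a running number to color by
provinces = pd.DataFrame({"province": sorted(toy_edges["from"].unique())})
provinces["color_id"] = range(len(provinces))   # follows the data, no hard-coded 33

print(provinces)
```

With such a frame, the arguments can be given as column names, for example px.choropleth(provinces, geojson=china, locations="province", color="color_id", color_continuous_scale="turbo").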

...and if we want to map a variable to the colormap:

In [19]:
# Create a dataframe of number of victims found in each province
to_df = clean_rescured_table.groupby(['to'])['to'].count().to_frame()
In [20]:
to_df.head(5)
Out[20]:
            to
to
Anhui      138
Beijing     37
Chongqing   64
Fujian     617
Gansu       21
In [21]:
fig = px.choropleth(to_df, 
                    geojson=china, 
                    locations = to_df.index,   
                    color = to_df['to'],                # variable we want to present on the map
                    color_continuous_scale="Purples",
                    basemap_visible = True,     
                   )

fig.update_geos(fitbounds="locations", visible=True)  # zoom the map to the locations we defined
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Lastly, we want to visualize the network on the map! The first step is to define the longitude and latitude of our nodes (provinces). Since I could not find such a file (I found locations of cities in China, but each province has many cities) and there are only 33 nodes, I just googled the location of each province and manually created a CSV.

In [22]:
# CSV I manually created
China_provinces  = pd.read_csv('China_Provinces.csv')

Then we create a base map, as we did before:

In [23]:
fig = go.Figure()

# Create a base map
fig = px.choropleth(from_to_df, 
                    geojson=china, 
                    locations = from_to_df['from'].unique() ,
                    color = range(33),
                    basemap_visible = False,
                    color_continuous_scale="turbo"
                   )

Next, we draw the edges one by one in a loop. For each edge we look up the start and end nodes, and then the latitude and longitude of each of them. The core trace type, go.Scattergeo, draws scatter points or lines on a geographic map from the lon/lat pairs provided.

In [24]:
for i in range(len(from_to_df)):
    
    # Define the start and end nodes of a connection
    start = from_to_df['from'][i]
    end = from_to_df['to'][i]
    
    # Define the lat and lon of each pair of start and end nodes
    start_lon = China_provinces.loc[China_provinces['provinces'] == start]['lon'].item()
    end_lon = China_provinces.loc[China_provinces['provinces'] == end]['lon'].item()
    start_lat = China_provinces.loc[China_provinces['provinces'] == start]['lat'].item()
    end_lat = China_provinces.loc[China_provinces['provinces'] == end]['lat'].item()
    
    # Add the link between the nodes
    fig.add_trace(
        go.Scattergeo(
            lon = [start_lon, end_lon],
            lat = [start_lat, end_lat],
            mode = 'lines',
            line = dict(width = math.log(from_to_df['count'][i]) ,color = 'black'),
        )
    )

fig.update_geos(fitbounds="locations", visible=True)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.update_coloraxes(showscale=False)
fig.show()
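
The loop above filters China_provinces four times per edge. A lighter alternative, sketched under the assumption that the CSV has the columns provinces, lon and lat used above, is to index the table by province name once and look coordinates up by label. The coordinates below are rough placeholders for illustration, not the values in China_Provinces.csv.

```python
import pandas as pd

# Placeholder coordinates for illustration only
toy_provinces = pd.DataFrame({
    "provinces": ["Anhui", "Fujian"],
    "lon": [117.0, 118.0],
    "lat": [32.0, 26.0],
})
toy_edges = pd.DataFrame({"from": ["Anhui"], "to": ["Fujian"], "count": [3]})

# Index by province name once, then look up each coordinate by label
coords = toy_provinces.set_index("provinces")

segments = []
for start, end, count in zip(toy_edges["from"], toy_edges["to"], toy_edges["count"]):
    segments.append({
        "lon": [coords.at[start, "lon"], coords.at[end, "lon"]],
        "lat": [coords.at[start, "lat"], coords.at[end, "lat"]],
        "count": count,
    })

print(segments)
```

Each segment's lon and lat lists can be passed straight to go.Scattergeo, exactly as in the loop above.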

And if you want to add the nodes to the map as well...

In [25]:
fig.add_trace(
    go.Scattergeo(
        lon = China_provinces['lon'],
        lat = China_provinces['lat'],
        hoverinfo = 'text',
        text = China_provinces['provinces'],
        mode = 'markers',
        marker = dict(
            size = 7,
            color = 'white',
            line = dict(width = 3, color = 'rgba(68, 68, 68, 0)'),
        ),
    )
)
  
fig.update_geos(fitbounds="locations", visible=True)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.update_coloraxes(showscale=False)
fig.show()